Let's say we want to prepare data and try several scalers and classifiers for a classification problem. We will tune the classifiers' hyperparameters with a grid search.
Preparing the data:
In [1]:
from sklearn.datasets import make_classification
X, y = make_classification()
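With make_classification's default arguments this is a small synthetic binary problem; a quick shape check (purely an illustrative sanity check, not part of the original workflow) should show 100 samples and 20 features:
# Expected with the defaults: (100, 20) (100,)
print(X.shape, y.shape)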
Setting the steps for our pipelines and the parameters for the grid search. Note that the keys of param_grid match the names given to the classifiers, so each classifier gets its own grid:
In [2]:
from reskit.core import Pipeliner
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
classifiers = [('LR', LogisticRegression()),
               ('SVC', SVC())]

scalers = [('standard', StandardScaler()),
           ('minmax', MinMaxScaler())]

steps = [('scaler', scalers),
         ('classifier', classifiers)]

param_grid = {'LR': {'penalty': ['l1', 'l2']},
              'SVC': {'kernel': ['linear', 'poly', 'rbf', 'sigmoid']}}
Setting up the cross-validation used for the grid search over hyperparameters and the cross-validation used to evaluate the models with the obtained hyperparameters. Note the different random states, so the two splitters produce different folds:
In [3]:
from sklearn.model_selection import StratifiedKFold
grid_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
eval_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
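For orientation, here is a rough sketch of what Pipeliner automates for a single scaler/classifier combination, written in plain scikit-learn (the names manual_pipe, search and scores are ours for illustration, not part of reskit):
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import Pipeline

# One row of the plan: the 'standard' scaler combined with the 'SVC' classifier.
manual_pipe = Pipeline([('scaler', StandardScaler()),
                        ('classifier', SVC())])

# Hyperparameters are tuned with grid_cv...
search = GridSearchCV(manual_pipe,
                      param_grid={'classifier__kernel': ['linear', 'poly',
                                                         'rbf', 'sigmoid']},
                      cv=grid_cv, scoring='roc_auc')
search.fit(X, y)

# ...and the model with the found hyperparameters is evaluated with eval_cv.
scores = cross_val_score(search.best_estimator_, X, y,
                         cv=eval_cv, scoring='roc_auc')
print(scores.mean())
Pipeliner performs the analogous procedure for every row of the plan table.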
Creating the plan of our experiment. Pipeliner builds a table with one row for every scaler/classifier combination:
In [4]:
pipe = Pipeliner(steps=steps, grid_cv=grid_cv, eval_cv=eval_cv, param_grid=param_grid)
pipe.plan_table
Out[4]:
To tune the hyperparameters and evaluate each pipeline from the plan, run:
In [5]:
pipe.get_results(X, y, scoring=['roc_auc'])
Out[5]:
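The returned table lists, for each scaler/classifier combination from the plan, the hyperparameters chosen on grid_cv and the roc_auc scores obtained on eval_cv.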